Scheduling in Data Intensive and Network Aware (DIANA) Grid Environments

Authors

  • Richard McClatchey
  • Ashiq Anjum
  • Heinz Stockinger
  • Arshad Ali
  • Ian Willers
  • Michael Thomas

Abstract

In Grids, scheduling decisions are often made on the basis of whether jobs are data intensive or computation intensive: in data-intensive situations jobs may be pushed to the data, and in computation-intensive situations data may be pulled to the jobs. This kind of scheduling, which does not consider network characteristics, can degrade performance in a Grid environment and may result in long processing queues and job execution delays due to site overloads. In this paper we describe a Data Intensive and Network Aware (DIANA) meta-scheduling approach, which takes data, processing power and network characteristics into account when making scheduling decisions across multiple sites. Through a practical implementation on a Grid testbed, we demonstrate that the queue and execution times of data-intensive jobs can be significantly improved when our proposed DIANA scheduler is introduced. The basic scheduling decisions are dictated by a weighting factor for each potential target location; this factor is calculated as a function of network characteristics, processing cycles, and data location and size. The job scheduler provides a global ranking of the computing resources and then selects an optimal one on the basis of this overall access and execution cost. The DIANA approach considers the Grid as a combination of active network elements and treats network characteristics as a first-class criterion in the scheduling decision matrix, alongside computation and data. The scheduler can then make informed decisions by taking into account the changing state of the network, the locality and size of the data, and the pool of available processing cycles.
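The ranking step described in the abstract can be illustrated with a small Python sketch. The cost model below is an assumption made for illustration and is not the weighting formula used by DIANA: the `Site` fields, the helper names (`network_cost`, `computation_cost`, `total_cost`, `rank_sites`) and the weights `w_net` and `w_comp` are all hypothetical. Network (access) cost is estimated as the time to transfer the input data, which is zero when the data is already at the site; execution cost is estimated as the time to drain the site's queue plus the job itself; and every candidate site is ranked globally by the weighted sum of the two.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Site:
    """Hypothetical description of a candidate execution site (illustrative only)."""
    name: str
    bandwidth_mbps: float  # available bandwidth from the data source to this site
    latency_s: float       # latency between the data source and this site
    cpu_power: float       # relative processing power (work units per second)
    queue_length: int      # jobs already waiting at this site
    has_data: bool         # True if the job's input data is already stored here


def network_cost(site: Site, data_size_mb: float) -> float:
    """Estimated seconds to move the input data to the site (zero if it is local)."""
    if site.has_data:
        return 0.0
    # MB * 8 gives megabits; dividing by Mbps gives transfer time in seconds.
    return site.latency_s + (data_size_mb * 8) / site.bandwidth_mbps


def computation_cost(site: Site, job_work: float) -> float:
    """Estimated seconds until the job completes, assuming every queued job
    needs the same amount of work as this one (a simplifying assumption)."""
    return (site.queue_length + 1) * job_work / site.cpu_power


def total_cost(site: Site, data_size_mb: float, job_work: float,
               w_net: float = 1.0, w_comp: float = 1.0) -> float:
    """Hypothetical weighted sum of access (network) cost and execution cost."""
    return (w_net * network_cost(site, data_size_mb)
            + w_comp * computation_cost(site, job_work))


def rank_sites(sites: Sequence[Site], data_size_mb: float, job_work: float,
               w_net: float = 1.0, w_comp: float = 1.0) -> List[Site]:
    """Global ranking of all candidate sites, cheapest first."""
    return sorted(sites, key=lambda s: total_cost(s, data_size_mb, job_work,
                                                   w_net, w_comp))


# Usage: site A holds the data but is heavily loaded; B and C must fetch it.
sites = [
    Site("A", bandwidth_mbps=1000, latency_s=0.05, cpu_power=10, queue_length=20, has_data=True),
    Site("B", bandwidth_mbps=100, latency_s=0.1, cpu_power=10, queue_length=0, has_data=False),
    Site("C", bandwidth_mbps=1000, latency_s=0.02, cpu_power=20, queue_length=2, has_data=False),
]
ranking = rank_sites(sites, data_size_mb=500, job_work=100)
print([s.name for s in ranking])  # → ['C', 'B', 'A']
```

In this example, pushing the job to the data (site A) ranks last because A's queue dominates its cost, while site C, reachable over a fast link and lightly loaded, ranks first. This mirrors the situation the abstract describes, where ignoring network and load information leads to long queues at the site that holds the data.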

Similar resources

A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability

A Data Grid is an infrastructure that manages huge numbers of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids focus on only one kind of Grid job, which can be data...

Bulk Scheduling with DIANA Scheduler

Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources and the size of the input and output files and the amount of overall storage space allocated to a user necess...

Network and Data Location Aware Job Scheduling in Grid: Improvement to GridWay Metascheduler

Grid computing has enabled us to utilize the unused computing power (CPU cycles) of computers connected to networks (e.g. the Internet). Nowadays, many scientific projects are under way in the domain of High Energy Physics (HEP), and Grid infrastructure constitutes the core computing facility of these projects. One such project is the LHC (Large Hadron Collider) deployed at CERN. These experimen...

A Rank-Based Hybrid Algorithm for Scheduling Data- and Computation-Intensive Jobs in Grid Environments

Scheduling is one of the most important challenges in grid computing environments. Most existing scheduling algorithms in grids focus on only one type of grid job, which can be data-intensive or computation-intensive. However, considering only one type of job in scheduling does not result in proper scheduling from the viewpoint of the whole system, and sometimes wastes resources on the ot...

LOGOS: Enabling Local Resource Managers for the Efficient Support of Data-Intensive Workflows within Grid Sites

In this study we discuss how to enable grid sites to support data-intensive workflows. Usually, within grid sites, tasks and resources are administered by local resource managers (LRMs). Many LRMs have been designed for managing compute-intensive applications. Therefore, data-intensive workflow applications might not perform well in such environments due to the number and size of da...

Journal:
  • CoRR

Volume: abs/0707.0862

Publication date: 2007